92 research outputs found

    SEED: efficient clustering of next-generation sequences.

    Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem to study the population sizes of DNA/RNA molecules and to reduce the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads.
    Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in less than 4 h with linear time and memory performance. When SEED was used as a preprocessing tool on genome/transcriptome assembly data, it reduced the time and memory requirements of the Velvet/Oases assembler for the datasets used in this study by 60-85% and 21-41%, respectively. In addition, the assemblies contained longer contigs than those from non-preprocessed data, as indicated by 12-27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results, with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.
    Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/[email protected]
    Supplementary information: Supplementary data are available at Bioinformatics online.
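
    The seed-and-hash idea in the abstract can be illustrated with a minimal sketch. It is not SEED's implementation: it assumes equal-length reads, ignores overhanging residues, and uses the first read of each cluster as a stand-in for a virtual center. The names block_keys and greedy_cluster and the choice of six blocks are hypothetical. The sketch relies on the pigeonhole argument that two reads differing by at most k mismatches must agree exactly on at least n_blocks - k of their blocks.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def block_keys(seq, n_blocks, max_mismatches):
    """Yield one hash key per way of keeping n_blocks - max_mismatches blocks.

    Assumption: len(seq) >= n_blocks; the last block absorbs any remainder.
    """
    size = len(seq) // n_blocks
    blocks = [seq[i * size:(i + 1) * size] for i in range(n_blocks - 1)]
    blocks.append(seq[(n_blocks - 1) * size:])
    keep = n_blocks - max_mismatches
    for idx in combinations(range(n_blocks), keep):
        # The block positions are part of the key so that identical text
        # found in different blocks does not collide.
        yield idx, tuple(blocks[i] for i in idx)


def greedy_cluster(reads, n_blocks=6, max_mismatches=3):
    """Greedy clustering: join the first center within max_mismatches."""
    index = {}     # key -> list of center ids that produced this key
    centers = []   # representative read of each cluster
    clusters = []  # reads assigned to each cluster
    for read in reads:
        assigned = None
        checked = set()
        for key in block_keys(read, n_blocks, max_mismatches):
            for cid in index.get(key, ()):
                if cid in checked:
                    continue
                checked.add(cid)
                # A shared key is only a candidate; verify the full distance.
                if hamming(read, centers[cid]) <= max_mismatches:
                    assigned = cid
                    break
            if assigned is not None:
                break
        if assigned is None:
            # No center is close enough, so this read starts a new cluster.
            assigned = len(centers)
            centers.append(read)
            clusters.append([])
            for key in block_keys(read, n_blocks, max_mismatches):
                index.setdefault(key, []).append(assigned)
        clusters[assigned].append(read)
    return centers, clusters


reads = ["ACGTACGTACGT", "ACGTACGTACGA", "ACCTACGTAGGA", "TTTTTTTTTTTT"]
centers, clusters = greedy_cluster(reads)
```

    Only reads that share at least one key with a center are compared in full, which is what keeps the number of pairwise comparisons small in this kind of scheme.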

    Mi-1-mediated resistance to Meloidogyne incognita in tomato may not rely on ethylene but hormone perception through ETR3 participates in limiting nematode infection in a susceptible host.

    Root-knot nematodes, Meloidogyne spp., are important pests of tomato (Solanum lycopersicum) and resistance to the three most prevalent species of this genus, including Meloidogyne incognita, is mediated by the Mi-1 gene. Mi-1 encodes a nucleotide-binding (NB) leucine-rich repeat (LRR) resistance (R) protein. Ethylene (ET) is required for the resistance mediated by a subset of NB-LRR proteins, but its role in Mi-1-mediated nematode resistance has not been characterized. Infection of tomato roots with M. incognita differentially induces ET biosynthetic genes in both compatible and incompatible interactions. Analyzing the expression of members of the ET biosynthetic gene families ACC synthase (ACS) and ACC oxidase (ACO) in both compatible and incompatible interactions shows differences in amplitude and temporal expression of both ACS and ACO genes in these two interactions. Since ET can promote both resistance and susceptibility against microbial pathogens in tomato, we investigated the role of ET in Mi-1-mediated resistance to M. incognita using both genetic and pharmacological approaches. Impairing ET biosynthesis or perception using virus-induced gene silencing (VIGS), the ET-insensitive Never ripe (Nr) mutant, or 1-methylcyclopropene (MCP) treatment did not attenuate Mi-1-mediated resistance to M. incognita. However, Nr plants compromised in ET perception showed enhanced susceptibility to M. incognita, indicating a role for ETR3 in basal resistance to root-knot nematodes.

    Sequence Analysis of the Potato Aphid Macrosiphum euphorbiae Transcriptome Identified Two New Viruses

    The potato aphid, Macrosiphum euphorbiae, is an important agricultural pest that causes economic losses to potato and tomato production. To establish the transcriptome for this aphid, RNA-Seq libraries constructed from aphids maintained on tomato plants were used in Illumina sequencing, generating 52.6 million 75-105 bp paired-end reads. The reads were assembled using Velvet/Oases software with SEED preprocessing, resulting in 22,137 contigs with an N50 value of 2,003 bp. After removal of contigs of tomato host origin, 20,254 contigs were annotated using BLASTx searches against the non-redundant protein database from the National Center for Biotechnology Information (NCBI) as well as InterProScan. This identified matches for 74% of the potato aphid contigs. The highest ranking hits for over 12,700 contigs were against the related pea aphid, Acyrthosiphon pisum. Gene Ontology (GO) was used to classify the identified M. euphorbiae contigs into biological process, cellular component and molecular function. Among the contigs, sequences of microbial origin were identified. Sixty-five contigs originated from the aphid's obligate bacterial endosymbiont Buchnera aphidicola, and two contigs had amino acid similarities to viruses. The latter two were named Macrosiphum euphorbiae virus 2 (MeV-2) and Macrosiphum euphorbiae virus 3 (MeV-3). MeV-2 shares the highest sequence identity with Dysaphis plantaginea densovirus, and MeV-3 with Hubei sobemo-like virus 49. Characterization of MeV-2 and MeV-3 indicated that both are transmitted vertically from adult aphids to nymphs. MeV-2 peptides were detected in the aphid saliva, and only MeV-2 and not MeV-3 nucleic acids were detected inside tomato leaves exposed to virus-infected aphids. However, MeV-2 nucleic acids did not persist in tomato leaf tissues after the aphids were removed from the plants, indicating that MeV-2 is likely an aphid virus.
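
    For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch using the common definition: the contig length L such that contigs of length L or longer contain at least half of the total assembled bases. The function name n50 and the example lengths are illustrative only and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the assembly defines the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one contig length")


example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # made-up toy values
example_n50 = n50(example_lengths)
```

    A larger N50 means that more of the assembly sits in long contigs, which is why it is used as an indicator of assembly contiguity.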